contrib/vmap_extractor_v2/doc/The MoPaQ File Format.txt

   1 THE MOPAQ ARCHIVE FORMAT
   2 v0.9 (Thursday, June 30, 2005)
   3 by Justin Olbrantz(Quantam)
   4
   5 Distribution and reproduction of this specification are allowed without limitation, as long as it is not altered. Quoting
   6 in other works is freely allowed, as long as the source and author of the quote is stated.
   7
   8 TABLE OF CONTENTS
   9 1. Introduction to the MoPaQ Format
  10 2. The MoPaQ Format
  11         2.1 General Archive Layout
  12         2.2 Archive Header
  13         2.3 Block Table
  14         2.4 Hash Table
  15         2.5 File Data
  16         2.6 Listfile
  17         2.7 Extended Attributes
  18         2.8 Weak (Old) Digital Signature
  19         2.9 Strong (New) Digital Signature
  20 3. Algorithm Source Code
  21         3.1 Encryption/Decryption
  22         3.2 Hashing
  23         3.3 Conversion of FILETIME and time_t
  24
  25 1. INTRODUCTION TO THE MOPAQ FORMAT
  26 The MoPaQ (or MPQ) format is an archive file format designed by Mike O'Brien (hence the name Mike O'brien PaCK) at Blizzard
  27 Entertainment. The format has been used in all Blizzard games since (and including) Diablo. It is heavily optimized to be
  28 a read-only game archive format, and excels at this role.
  29
  30 The Blizzard MoPaQ-reading functions are contained in the Storm module, which my be either statically or dynamically linked.
  31 The Blizzard MoPaQ-writing functions are contained in the MPQAPI module, which is always statically linked.
  32
  33 2. THE MOPAQ FORMAT
  34 All numbers in the MoPaQ format are in little endian. Data types are listed either as int (integer, the number of bits specified),
  35 byte (8 bits), and char (bytes which contain ASCII characters). All sizes and offsets are in bytes, unless specified otherwise.
  36 Structure members are listed in the following general form:
  37 offset from the beginning of the structure: data type(array size) member name : member description
  38
  39 2.1 GENERAL ARCHIVE LAYOUT
  40 - Archive Header
  41 - File Data
  42 - File Data - Special Files
  43 - Hash Table
  44 - Block Table
  45 - Strong Digital signature
  46
  47 This is the usual archive format, and is not absolutely essential. Some archives have been observed placing the hash table
  48 and file table after the archive header, and before the file data.
  49
  50 2.2 ARCHIVE HEADER
  51 00h: char(4) Magic : Indicates that the file is a MoPaQ archive. Must be ASCII "MPQ" 1Ah.
  52 04h: int32 HeaderSize : Size of the archive header. Should be 32.
  53 08h: int32 ArchiveSize : Size of the whole archive, including the header. Does not include the strong digital signature, if present.
  54 This size is used, among other things, for determining the region to hash in computing the digital signature.
  55 0Ch: int16 Unknown : Unknown
  56 0Eh: int8 SectorSizeShift : Power of two exponent specifying the number of 512-byte disk sectors in each logical sector
  57 in the archive. The size of each logical sector the archive is 512 * 2^SectorSizeShift. Bugs in the Storm library dictate
  58 that this should always be 3 (4096 byte sectors).
  59 10h: int32 HashTableOffset : Offset to the beginning of the hash table, relative to the beginning of the archive.
  60 14h: int32 BlockTableOffset : Offset to the beginning of the block table, relative to the beginning of the archive.
  61 18h: int32 HashTableEntries : Number of entries in the hash table. Must be a power of two, and must be less than 2^16.
  62 1Ch: int32 BlockTableEntries : Number of entries in the block table.
  63
  64 The archive header is the first structure in the archive, at archive offset 0, but the archive does not need to be at offset
  65 0 of the containing file. The offset of the archive in the file is referred to here as ArchiveOffset. If the archive is not
  66 at the beginning of the file, it must begin at a disk sector boundary (512 bytes). Early versions of Storm require that the
  67 archive be at the end of the containing file (ArchiveOffset + ArchiveSize = file size), but this is not required in newer
  68 versions (due to the strong digital signature not being considered a part of the archive).
  69
  70 2.3 BLOCK TABLE
  71 The block table contains entries for each region in the archive. Regions may be either files or empty space, which may be
  72 overwritten by new files (typically this space is from deleted file data). The block table is encrypted, using the hash
  73 of "(block table)" as the key. Each entry is structured as follows:
  74
  75 00h: int32 BlockOffset : Offset of the beginning of the block, relative to the beginning of the archive. Meaningless if the block size is 0.
  76 04h: int32 BlockSize : Size of the block in the archive.
  77 08h: int32 FileSize : Size of the file data stored in the block. Only valid if the block is a file, otherwise meaningless, and should be 0. If the file is compressed, this is the size of the uncompressed file data.
  78 0Ch: int32 Flags : Bit mask of the flags for the block. The following values are conclusively identified:
  79         80000000h: Block is a file, and follows the file data format; otherwise, block is free space, and may be overwritten. If the block is not a file, all other flags should be cleared.
  80         01000000h: File is stored as a single unit, rather than split into sectors.
  81         00020000h: The file's encryption key is adjusted by the block offset and file size (explained in detail in the File Data section). File must be encrypted.
  82         00010000h: File is encrypted.
  83         00000200h: File is compressed. Mutually exclusive to file imploded.
  84         00000100h: File is imploded. Mutually exclusive to file compressed.
  85
  86 2.4 HASH TABLE
  87 Instead of storing file names, for quick access MoPaQs use a fixed, power of two-size hash table of files in the archive. A file is uniquely identified by its file path, its language, and its platform. The home entry for a file in the hash table is computed as a hash of the file path. In the event of a collision (the home entry is occupied by another file), progressive overflow is used, and the file is placed in the next available hash table entry. Searches for a desired file in the hash table proceed from the home entry for the file until either the file is found, the entire hash table is searched, or an empty hash table entry (FileBlockIndex of FFFFFFFFh) is encountered. The hash table is encrypted using the hash of "(hash table)" as the key. Each entry is structured as follows:
  88
  89 00h: int32 FilePathHashA : The hash of the file path, using method A.
  90 04h: int32 FilePathHashB : The hash of the file path, using method B.
  91 08h: int16 Language : The language of the file. This is a Windows LANGID data type, and uses the same values. 0 indicates the default language (American English), or that the file is language-neutral.
  92 0Ah: int8 Platform : The platform the file is used for. 0 indicates the default platform. No other values have been observed.
  93 0Ch: int32 FileBlockIndex : If the hash table entry is valid, this is the index into the block table of the file. Otherwise, one of the following two values:
  94         FFFFFFFFh: Hash table entry is empty, and has always been empty. Terminates searches for a given file.
  95         FFFFFFFEh: Hash table entry is empty, but was valid at some point (in other words, the file was deleted). Does not terminate searches for a given file.
  96
  97 2.5 FILE DATA
  98 00h: int32(SectorsInFile + 1) SectorOffsetTable : Offsets to the start of each sector's data, relative to the beginning of the file data. Not present if this information is calculatable (see details below).
  99 immediately following SectorOffsetTable: SectorData : Data of each sector in the file, packed end to end (see details below).
 100
 101 Normally, file data is split up into sectors, for simple streaming. All sectors, save for the last, will contain as many bytes of file data as specified in the archive header's SectorSizeShift; the last sector may be smaller than this, depending on the size of the file data. This sector size is the size of the raw file data; if the file is compressed, the compressed sector will be smaller or the same size as the uncompressed sector size. Individual sectors in a compressed file may be stored uncompressed; this occurs if and only if the sector could not be compressed by the algorithm used (if the compressed sector size was greater than or equal to the size of the raw data), and is indicated by the sector's compressed size in SectorOffsetTable being equal to the uncompressed size of the sector (which may be calculated from the FileSize).
 102
 103 If the sector is compressed (but not imploded), a bit mask byte of the compression algorithm(s) used to compress the sector is appended to the beginning of the compressed sector data. This additional byte counts towards the total size of the sector; if the size of the sector (including this byte) exceeds or matches the uncompressed size of the sector data, the sector will be stored uncompressed, and this byte omitted. Multiple compression algorithms may be used on the same sector; in this case, successive compression occurs in the order the algorithms are listed below, and decompression occurs in the opposite order. For implimentations of all of these algorithms, see StormLib.
 104         40h: IMA ADPCM mono
 105         80h: IMA ADPCM stereo
 106         01h: Huffman encoded
 107         02h: Deflated (see ZLib)
 108         08h: Imploded (see PKWare Data Compression Library)
 109         10h: BZip2 compressed (see BZip2)
 110
 111 If the file is stored as a single unit (indicated in the file's Flags), there is effectively only a single sector, which
 112 contains the entire file.
 113
 114 If the file is encrypted, each sector (after compression and appendage of the compression type byte, if applicable)
 115 is encrypted with the file's key. The base key for a file is determined by a hash of the file name stripped of the
 116 directory (i.e. the key for a file named "directory\file" would be computed as the hash of "file"). If this key is
 117 adjusted, as indicated in the file's Flags, the final key is calculated as ((base key + BlockOffset - ArchiveOffset)
 118 XOR FileSize) (StormLib - an open-source implementation of the MoPaQ reading and writing functions,
 119 by Ladislav Zezula - incorrectly uses an AND in place of the XOR). Each sector is encrypted using the key + the
 120 0-based index of the sector in the file. The SectorOffsetTable, if present, is encrypted using the key - 1.
 121
 122 The SectorOffsetTable is omitted when the sizes and offsets of all sectors in the file are calculatable from the FileSize.
 123 This can happen in several circumstances. If the file is not compressed/imploded, then the size and offset of all sectors
 124 is known, based on the archive's SectorSizeShift. If the file is stored as a single unit compressed/imploded, then the
 125 SectorOffsetTable is omitted, as the single file "sector" corresponds to BlockSize and FileSize, as mentioned previously.
 126 Note that the SectorOffsetTable will always be present if the file is compressed/imploded and the file is not stored as
 127 a single unit, even if there is only a single sector in the file (the size of the file is less than or equal to the
 128 archive's sector size).
 129
 130 2.6 LISTFILE
 131 The listfile is a very simple extension to the MoPaQ format that contains the file paths of (most) files in the archive.
 132 The languages and platforms of the files are not stored in the listfile. The listfile is contained in the file "(listfile)",
 133 and is simply a non-Unix-style text file with one file path on each line, lines terminated with the bytes 0Dh 0Ah. The file
 134 "(listfile)" may not be listed in the listfile.
 135
 136 2.7 EXTENDED ATTRIBUTES
 137 The extended attributes are optional file attributes for files in the block table. These attributes were added at times after
 138 the MoPaQ format was already finalized, and it is not necessary for every archive to have all (or any) of the extended attributes.
 139 If an archive contains a given attribute, there will be an instance of that attribute for every block in the block table, although
 140 the attribute will be meaningless if the block is not a file. The order of the attributes for blocks correspond to the order of the
 141 blocks in the block table, and are of the same number. The attributes are stored in parallel arrays in the "(attributes)" file,
 142 in the archive. The attributes corresponding to this file need not be valid (and logically cannot be). Unlike all the other
 143 structures in the MoPaQ format, entries in the extended attributes are NOT guaranteed to be aligned. Also note that in some
 144 archives, malicious zeroing of the attributes has been observed, perhaps with the intent of breaking archive viewers. This
 145 file is structured as follows:
 146
 147 00h: int32 Version : Specifies the extended attributes format version. For now, must be 100.
 148 04h: int32 AttributesPresent : Bit mask of the extended attributes present in the archive:
 149         00000001h: File CRC32s.
 150         00000002h: File timestamps.
 151         00000004h: File MD5s.
 152 08h: int32(BlockTableEntries) CRC32s : CRC32s of the (uncompressed) file data for each block in the archive. Omitted if the
 153 archive does not have CRC32s. immediately after CRC32s: FILETIME(BlockTableEntries) Timestamps : Timestamps for each block
 154 in the archive. The format is that of the Windows FILETIME structure. Omitted if the archive does not have timestamps.
 155 immediately after Timestamps: MD5(BlockTableEntries) MD5s : MD5s of the (uncompressed) file data for each block in the archive.
 156 Omitted if the archive does not have MD5s.
 157
 158 2.8 WEAK DIGITAL SIGNATURE
 159 The weak digital signature is a digital signature using Microsoft CryptoAPI. It is an implimentation of the RSASSA-PKCS1-v1_5
 160 digital signature protocol, using the MD5 hashing algorithm and a 512-bit (weak) RSA key (for more information about this
 161 protocol, see the RSA Labs PKCS1 specification). The public key and exponent are stored in a resource in Storm. The signature
 162 is stored uncompressed, unencrypted in the file "(signature)" in the archive. The archive is hashed from the beginning of the
 163 archive (ArchiveOffset in the containing file) to the end of the archive (the length indicated by ArchiveSize); the signature
 164 file is added to the archive before signing, and the space occupied by the file is considered to be all binary 0s during
 165 signing/verification. This file is structured as follows:
 166
 167 00h: int32 Unknown : Must be 0.
 168 04h: int32 Unknown : must be 0.
 169 08h: int512 Signature : The digital signature. Like all other numbers in the MoPaQ format, this is stored in little-endian order.
 170
 171 2.9 STRONG DIGITAL SIGNATURE
 172 The strong digital signature uses a simple proprietary implementation of RSA signing, using the SHA-1 hashing algorithm and
 173 a 2048-bit (strong) RSA key. The default public key and exponent are stored in Storm, but other keys may be used as well.
 174 The strong digital signature is stored immediately after the archive, in the containing file; the entire archive (ArchiveSize
 175 bytes, starting at ArchiveOffset in the containing file) is hashed as a single block. The signature has the following format:
 176
 177 00h: char(4) Magic : Indicates the presence of a digital signature. Must be "NGIS" ("SIGN" backwards).
 178 04h: int2048 Signature : The digital signature, stored in little-endian format.
 179
 180 When the Signature field is decrypted with the public key and exponent, and the result stored in little-endian order, it is structured as follows:
 181
 182 00h: byte Padding : Must be 0Bh.
 183 01h: byte(235) Padding : Must be BBh.
 184 ECh: byte(20) SHA-1 : SHA-1 hash of the archive, in standard SHA-1 format.
 185
 186 3. ALGORITHM SOURCE CODE
 187 3.1 ENCRYPTION/DECRYPTION
 188 I believe this was derived at some point from code in StormLib. Assumes the long type to be 32 bits, and the machine to be little endian order.
 189
 190 unsigned long dwCryptTable[0x500];
 191
 192 void InitializeCryptTable()
 193 {
 194     unsigned long seed   = 0x00100001;
 195     unsigned long index1 = 0;
 196     unsigned long index2 = 0;
 197     int   i;
 198
 199     for (index1 = 0; index1 < 0x100; index1++)
 200     {
 201         for (index2 = index1, i = 0; i < 5; i++, index2 += 0x100)
 202         {
 203             unsigned long temp1, temp2;
 204
 205             seed  = (seed * 125 + 3) % 0x2AAAAB;
 206             temp1 = (seed & 0xFFFF) << 0x10;
 207
 208             seed  = (seed * 125 + 3) % 0x2AAAAB;
 209             temp2 = (seed & 0xFFFF);
 210
 211             dwCryptTable[index2] = (temp1 | temp2);
 212         }
 213     }
 214 }
 215
 216 void EncryptData(void *lpbyBuffer, unsigned long dwLength, unsigned long dwKey)
 217 {
 218     unsigned long *lpdwBuffer = (unsigned long *)lpbyBuffer;
 219     unsigned long seed = 0xEEEEEEEE;
 220     unsigned long ch;
 221
 222         assert(lpbyBuffer);
 223
 224     dwLength /= sizeof(unsigned long);
 225
 226     while(dwLength-- > 0)
 227     {
 228         seed += dwCryptTable[0x400 + (dwKey & 0xFF)];
 229         ch = *lpdwBuffer ^ (dwKey + seed);
 230
 231         dwKey = ((~dwKey << 0x15) + 0x11111111) | (dwKey >> 0x0B);
 232         seed = *lpdwBuffer + seed + (seed << 5) + 3;
 233
 234                 *lpdwBuffer++ = ch;
 235     }
 236 }
 237
 238 void DecryptData(void *lpbyBuffer, unsigned long dwLength, unsigned long dwKey)
 239 {
 240         unsigned long *lpdwBuffer = (unsigned long *)lpbyBuffer;
 241     unsigned long seed = 0xEEEEEEEE;
 242     unsigned long ch;
 243
 244         assert(lpbyBuffer);
 245
 246     dwLength /= sizeof(unsigned long);
 247
 248     while(dwLength-- > 0)
 249     {
 250         seed += dwCryptTable[0x400 + (dwKey & 0xFF)];
 251         ch = *lpdwBuffer ^ (dwKey + seed);
 252
 253         dwKey = ((~dwKey << 0x15) + 0x11111111) | (dwKey >> 0x0B);
 254         seed = ch + seed + (seed << 5) + 3;
 255
 256                 *lpdwBuffer++ = ch;
 257     }
 258 }
 259
 260 3.2 HASHING
 261 Based on code from StormLib.
 262
 263 // Different types of hashes to make with HashString
 264 #define MPQ_HASH_TABLE_OFFSET   0
 265 #define MPQ_HASH_NAME_A 1
 266 #define MPQ_HASH_NAME_B 2
 267 #define MPQ_HASH_FILE_KEY       3
 268
 269 unsigned long HashString(const char *lpszString, unsigned long dwHashType)
 270 {
 271     unsigned long  seed1 = 0x7FED7FED;
 272     unsigned long  seed2 = 0xEEEEEEEE;
 273     int    ch;
 274
 275     while (*lpszString != 0)
 276     {
 277         ch = toupper(*lpszString++);
 278
 279         seed1 = dwCryptTable[(dwHashType * 0xFF) + ch] ^ (seed1 + seed2);
 280         seed2 = ch + seed1 + seed2 + (seed2 << 5) + 3;
 281     }
 282     return seed1;
 283 }
 284
 285 3.3 CONVERSION OF FILETIME AND time_t
 286
 287 #define EPOCH_OFFSET 116444736000000000ULL      // Number of 100 ns units between 01/01/1601 and 01/01/1970
 288
 289 bool GetTimeFromFileTime(FILETIME &fileTime, time_t &time)
 290 {
 291         // The FILETIME represents a 64-bit integer: the number of 100 ns units since January 1, 1601
 292         unsigned long long nTime = ((unsigned long long)fileTime.dwHighDateTime << 32) + fileTime.dwLowDateTime;
 293
 294         if (nTime < EPOCH_OFFSET)
 295                 return false;
 296
 297         nTime -= EPOCH_OFFSET;  // Convert the time base from 01/01/1601 to 01/01/1970
 298         nTime /= 10000000ULL;   // Convert 100 ns to sec
 299
 300         time = (time_t)nTime;
 301
 302         // Test for overflow (FILETIME is 64 bits, time_t is 32 bits)
 303         if ((nTime - (unsigned long long)time) > 0)
 304                 return false;
 305
 306         return true;
 307 }
 308
 309 void GetFileTimeFromTime(time_t &time, FILETIME &fileTime)
 310 {
 311         unsigned long long nTime = (unsigned long long)time;
 312
 313         nTime *= 10000000ULL;
 314         nTime += EPOCH_OFFSET;
 315
 316         fileTime.dwLowDateTime = (DWORD)nTime;
 317         fileTime.dwHighDateTime = (DWORD)(nTime >> 32);
 318 }